DOCUMENT RESUME

ED 194 636                                        TM 800 768

AUTHOR          Engelhard, George, Jr.
TITLE           An Introduction to Rasch Measurement and Its
                Application to Test Equating in the Comprehensive
                Assessment Program.
PUB DATE        May 80
NOTE            27p.; Paper presented at the Annual Meeting of the
                Northern Illinois Association for Educational
                Research, Evaluation and Development (4th,
                Bloomingdale, IL, May 9, 1980).

EDRS PRICE      MF01/PC02 Plus Postage.
DESCRIPTORS     *Achievement Tests; Difficulty Level; Elementary
                Secondary Education; *Equated Scores; *Latent Trait
                Theory; Mathematical Models; Reading Tests; Student
                Evaluation; Test Items; *Test Theory
IDENTIFIERS     *Comprehensive Assessment Program; *Rasch Model;
                Unidimensional Scaling; Vertical Equating



ABSTRACT

The Rasch model is described as a latent trait model which meets the
five criteria that characterize reasonable and objective measurements
of an individual's ability independent of the test items used. The
criteria are: (1) calibration of test items must be independent of
particular norming groups; (2) measurement of individuals must be
independent of particular items used for measuring; (3) test items
must measure a single underlying trait or ability; (4) a more able
individual must have a better chance of success with an item than a
less able individual; and (5) an individual must have a better chance
of success on an easy item than on a difficult item. To illustrate the
use of the Rasch model for vertical equating, equal interval scales
(EIS) were developed and applied to the Scott, Foresman Comprehensive
Assessment Program (CAP), a coordinated series of tests and measures
for evaluating students' educational growth. (Author)



Reproductions supplied by EDRS are the best that can be made
from the original document.



AN INTRODUCTION TO RASCH MEASUREMENT AND
ITS APPLICATION TO TEST EQUATING
IN THE
COMPREHENSIVE ASSESSMENT PROGRAM



George Engelhard, Jr; 
University of Chicago 






Paper presented at the Northern Illinois Association for Educational
Research, Evaluation and Development meeting in Bloomingdale, Illinois,
May, 1980.



Abstract 



The purpose of this paper is to provide a basic introduction to the
Rasch model and to illustrate its use for equating psychological and
educational tests. The data used for the equating example were taken
from a set of standardized reading tests that are a part of the
Achievement Series of the Comprehensive Assessment Program (Scott,
Foresman and Company, 1980).



AN INTRODUCTION TO RASCH MEASUREMENT AND 
ITS APPLICATION TO TEST EQUATING 



INTRODUCTION

One of the major problems encountered in educational measurement
is the equating of person measurements obtained on different tests. This
problem occurs whenever the variable of interest is represented by a range
of item difficulties which go beyond the ability of any one group of
individuals to attempt. For example, as educators we may be interested in
tracing an individual's growth over the elementary school years. Any single
test that we might use would be much too difficult for first graders and
much too simple for eighth graders. If we use multiple tests, composed of
items whose difficulties are appropriate for each person's level of ability,
then we are faced with the problem of determining the equivalence or
comparability of measures obtained from several different measuring
instruments. A solution to the equating problem can be found if we can
create several tests composed of items calibrated onto a single scale which
represents a unidimensional construct (e.g., reading ability) and spans the
time period over which we wish to measure growth. In order to accomplish
this goal, we need a method for equating tests and linking items together.
These linked items can be used to represent the latent construct or variable
of interest on which we wish to measure an individual's growth and change.

This problem was recognized by Thorndike in the early 1920s.

With the development of group tests and tests
for use with higher levels of intelligence, it
is becoming more and more necessary to transmute
a score obtained with one test into the score
that is equivalent to it in some other test.

(Thorndike, 1922, p. 29)

Various methods have been proposed as solutions to the equating problem.
Thorndike "transmuted" scores using his probable error method of scaling
(Thorndike, 1922; Trabue, 1916). Thurstone, in a series of articles in the
1920s, described his absolute scaling method, which he proposed as a solution
to the equating problem (Thurstone, 1925, 1927, 1928). More recently,
latent trait measurement theory has been recommended as a source of solutions
to the "intractable" problem of equating (Lord, 1977; Marco, 1977; Rasch,
1960; Wright, 1967; Wright and Stone, 1979).

In his extensive discussion of equating, Angoff (1971) listed what
he considered two "restrictions," or what may be better thought of as
reasonable assumptions and conditions necessary in order to equate tests.
These conditions are:

1. that the two instruments (tests, items) in
   question be measures of the same character-
   istics in the same sense that degrees Fahren-
   heit and centigrade, for example, are both
   units of temperature, inches and centimeters
   are both units of length, etc.
   (unidimensionality condition)

2. that, in order to be truly a transformation
   of systems of units, the conversion must be
   unique, except for the random error associa-
   ted with the unreliability of the data and
   the method used for determining the trans-
   formation; the resulting conversion should
   be independent of the individuals from whom
   the data were drawn to develop the conversion
   and should be freely applicable to all
   (sample-free condition)

The first condition for acceptable equating involves the unidimensionality of
the measures to be equated, while the second condition implies a sample-free
procedure for equating. Both of these conditions are necessary in order to
realize the advantages of equated tests. All of the previously proposed
procedures can meet or approximate the first condition of unidimensionality.
None of the methods proposed prior to the development of latent trait
measurement theory meet the sample-free condition, and only one set of latent
trait models, Rasch measurement models, offers reasonable solutions to the
problem of sample-free equating (Engelhard, 1980).



The purpose of this paper is to provide a basic introduction to the
Rasch model and to illustrate its use for equating educational tests. The
data used for the equating example were taken from a preliminary set of
reading tests which are part of the Comprehensive Assessment Program (Scott,
Foresman and Company, 1980).

INTRODUCTION TO THE RASCH MODEL

During the 1950s, Georg Rasch conducted the basic psychometric work
which led to the publication in 1960 of his book, Probabilistic Models for
Some Intelligence and Attainment Tests. The ideas and methods presented in
this book represent some of the most innovative and useful work in
psychometrics since Thurstone's work in the 1920s. In fact, Rasch's work
represents an almost totally new approach to psychometrics.

In traditional or classical psychometrics, the properties of a test
are defined in terms of variations within some specified population of
people. As a consequence, the properties of the test, e.g., the reliability
coefficient, are not specific to the test itself, but will vary depending
on the population chosen. Similarly, the measurement of a person on the
variable of interest will depend on which items are used. In traditional
approaches to person measurement, the ability estimate depends not only on
which items are used, but also on the group of people with which the person
is compared. As Wright has pointed out,

If all of a specified set of items have been
tried by a child you wish to measure, then
you can obtain his percentile position among
whatever groups of children were used to stan-
dardize the test. But how do you interpret
this measure beyond the confines of that set
of items and those groups of children?
Change the children and you have a new yard-
stick. Change the items and you have a new
yardstick again. Each collection of items mea-
sures an ability of its own. Each measure
depends for its meaning on its own family
of test takers. How can we make objective
mental measurements and build a science of
mental development when we work with rubber
yardsticks?

(Wright, 1967, p. 86)

The use of Rasch measurement models provides a reasonable solution
to the problem of "rubber yardsticks" by providing estimates for intrinsic
properties of tests and items which are independent of the group that
happens to be used to calibrate the items. This is called person-free item
calibration. It also yields estimates of a person's ability which are
independent of the test items used. This leads to the possibility of
item-free person measurement. Of course this does not mean that we can
measure people without items, but it does mean that once items are calibrated
through the use of the Rasch model and assigned a position on the latent
variable of interest, then any set of items can be used to obtain an
estimate of a person's ability. These two consequences, person-free item
calibration and item-free person measurement, are necessary in order to have
objective measures.



In order to obtain reasonable and objective measurement, the
measurement model utilized must satisfy at least the following five
conditions. These conditions are that:

1. the calibration of test items must be in-
   dependent of the particular individuals used
   for the calibration.

2. the measurement of individuals must be in-
   dependent of the particular items that hap-
   pen to be used for the measuring.

3. the test items must be measuring a single
   underlying trait or ability.

4. a more able individual must always have
   a better chance of success on an item
   than a less able individual.

5. any individual must have a better chance
   of success on an easy item than a more
   difficult item.

The Rasch model is a latent trait model that has been proposed for
person measurement that meets these five conditions. Basically, latent trait
models are ideas or inventions that attempt to specify what happens when a
person tries an item. (See Hambleton and Cook (1977) for a general
introduction to latent trait models.)

Of all the latent trait models, Rasch measurement models have the
fewest ingredients: one ability parameter, $\beta_n$, for each person n and
one difficulty parameter, $\delta_i$, for each item i. These parameters
represent the position or location of persons and items on the latent
variable. For example, if the latent variable is reading ability, we develop
and choose a set of items to represent this variable. These items are then
given to a group of people and their locations are determined through the
application of the Rasch model. The locations of the people on the latent
variable, reading ability, are given by the ability estimates, while the
locations of the items are given by the difficulty estimates. This is
illustrated in Diagram 1.



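Diagram 1. Defining a variable.

[Diagram: a horizontal line representing the latent variable. Items
($\delta_1$ to $\delta_5$) are located below the line, running from EASY
to HARD. The person measure is located above the line, with the scale
running toward HIGH.]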

In Diagram 1, the line represents the latent variable called reading ability.
Five items have been chosen to represent this construct, and their
difficulties, which locate them on the latent variable, are shown below the
line ($\delta_1$ to $\delta_5$). The items range from easy on the left to
more difficult on the right. Person measurements are shown above the line,
and in this case there is one person measure. This person correctly answered
items 1 to 3 and incorrectly answered items 4 and 5. This person's score
would be 3, and this value can be used to locate the person on the latent
variable by providing an estimate of reading ability.
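
How a raw score can be turned into a location on the latent variable may be
easier to see in a small numerical sketch. The Python code below is not part
of the original paper. It assumes that the five item difficulties are already
known (the values in `item_difficulties` are hypothetical) and that the
ability estimate is the maximum likelihood value, found here by
Newton-Raphson iteration. The function name `estimate_ability` is
illustrative only.

```python
import math


def estimate_ability(raw_score, difficulties, tol=1e-8, max_iter=100):
    """Maximum likelihood ability estimate (in logits) for a raw score.

    Assumes the dichotomous Rasch model and item difficulties that are
    already calibrated. Zero and perfect scores have no finite estimate.
    """
    n_items = len(difficulties)
    if not 0 < raw_score < n_items:
        raise ValueError("raw score must lie strictly between 0 and the number of items")
    # Starting value: log-odds of the observed proportion correct.
    beta = math.log(raw_score / (n_items - raw_score))
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(beta - d))) for d in difficulties]
        expected_score = sum(probs)
        information = sum(p * (1.0 - p) for p in probs)
        step = (raw_score - expected_score) / information
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical difficulties (logits) for five items ordered from easy to hard.
item_difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
ability = estimate_ability(3, item_difficulties)
print(round(ability, 2))
```

Under these assumptions the estimate for a score of 3 lies above the middle
item, and the estimate for a score of 2 is its mirror image below it, because
the hypothetical difficulties are symmetric about zero.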

The ability parameters and difficulty parameters are combined in order
to represent one latent dimension by forming their difference
$(\beta_n - \delta_i)$. This difference governs the probability of what
happens when person n attempts item i. The basic data which we have in any
testing situation is a matrix of 0's and 1's which represent each
individual's failure (0) or success (1) on each item. This is illustrated in
Diagram 2.

Diagram 2. The essential conditions causing a response.

[Diagram: person ability $\beta_n$ and item difficulty $\delta_i$ together
produce $x_{ni} = 0, 1$, the response of person n to item i.]

The mathematical model used to express this relationship is shown in
Diagram 3.

Diagram 3. Mathematical formulation of the Rasch model with two response
categories.

$$\Pr\{x_{ni} \mid \beta_n, \delta_i\} = \frac{\exp[x_{ni}(\beta_n - \delta_i)]}{1 + \exp(\beta_n - \delta_i)}, \qquad x_{ni} = 0, 1$$

The probability of observing a correct response (1) or an incorrect response
(0) by person n on item i is a function of the difference between the
person's ability ($\beta_n$) and the item's difficulty ($\delta_i$). The
relationship represented by the Rasch model between this difference
$(\beta_n - \delta_i)$ and the probability of success on an item can be
illustrated with an item characteristic curve or response curve in which the
item difficulty remains constant while person ability varies (see Diagram 4).
If the person's ability equals the difficulty of the item, then the person
has a 50% chance of success on that item. In other words, the person can be
expected to succeed half the time on this kind of item, and conversely to
fail half the time. If the person's ability exceeds the item's difficulty,
then the person has a better than 50% chance of success on the item; if the
item's difficulty exceeds the person's ability, then the person has a less
than 50% chance of success.
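
The three cases just described can be checked numerically. The following
Python sketch is not from the original paper. The function name
`rasch_probability` and the ability and difficulty values are assumptions
chosen only for illustration.

```python
import math


def rasch_probability(beta: float, delta: float) -> float:
    """Probability of a correct response (x = 1) under the dichotomous Rasch model."""
    # exp(beta - delta) / (1 + exp(beta - delta)), written in its logistic form.
    return 1.0 / (1.0 + math.exp(-(beta - delta)))


# Hypothetical values in logits.
p_equal = rasch_probability(beta=0.5, delta=0.5)   # ability equals difficulty
p_above = rasch_probability(beta=1.5, delta=0.5)   # ability exceeds difficulty
p_below = rasch_probability(beta=-0.5, delta=0.5)  # difficulty exceeds ability

print(p_equal, p_above, p_below)
```

Only the difference between ability and difficulty enters the calculation, so
shifting both values by the same amount leaves every probability unchanged.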



Diagram 4. Response curve.

[Diagram: response curve showing the probability of a correct response, with
one region labeled Pr{x=1} less than .5 and the other labeled Pr{x=1}
greater than .5.]

TEST EQUATING WITH THE RASCH MODEL

As pointed out earlier, various procedures have been proposed for test
equating, but the only method which meets all the conditions necessary for
objective equating is the Rasch model. Our goal in test equating is to
step beyond the specific items contained in separate tests in order to
get information on the latent trait or unobservable variable which is of
interest. Since no individual can handle the full range of difficulties,
it is necessary to translate the measures obtained on different tests
into one common metric on a unidimensional scale that represents the latent
variable. For example, suppose we are interested in measuring the change
and growth in reading ability of students from grade 3 to grade 4. If the
students were given exactly the same test, many of the students in the
beginning of grade 3 would experience frustration when attempting items
appropriate for them at the end of grade 4; these items would be obviously
too difficult and thus inappropriate for a grade 3 student. Conversely,
when these students are in grade 4, they might become bored with items
appropriate for grade 3 students and now obviously too easy. In addition
to these extraneous influences on the measuring situation, there are
problems, such as memory effects, that arise when children are retested using
the same tests. The well-known fact that ability estimates are most accurate
when they are based on items of appropriate difficulty for the student must
also be considered. One approach is to link several tests together with
a subset of carefully chosen items, so that the students are taking tests
which are appropriate for their ability. This will minimize extraneous
influences on the measuring situation and provide more accurate estimates of
an individual's ability or location on the latent trait. This link of
common items can be displayed as shown in Diagram 5.

Diagram 5. Common item link.

[Diagram: two test forms joined by a linking constant obtained from their
common items.]

Figure 2 illustrates this type of display with the actual linking constants
in position for several forms which measure reading ability over a 6 year
period.

The basic logic behind the linking of tests through common items can
be illustrated using the following table, which is based on hypothetical
data.



                        Forms
               Grade 3       Grade 4       Linking Constant

$M_\delta$     .5 (a)        -.5 (b)       1.0

$M_\beta$      0.0 (c)       0.0 (d)       1.0

$M_\beta'$     0.0 (c')      1.0 (d')

Suppose students in grades 3 and 4 each have taken a separate test with 10
common items. The average difficulty estimates ($M_\delta$) of these 10 items
for each group are shown in cells (a) and (b). In order to compute these
estimates, separate calibrations are conducted on each test using the
Rasch model. (See Wright and Mead (1976) for a description of the
calibration procedure and a computer program that can be used to obtain these
estimates.) The next step is to take the two independent difficulty
estimates for the 10 common items and compute the two mean difficulties
which were obtained through the separate calibrations using each grade.
The average ability estimates ($M_\beta$) for each grade are centered at zero
in the usual way (Wright and Stone, 1979). Since the items are the same,
they should represent the same point and location on the latent trait
scale. In other words, the difficulty estimates for the common items should
ideally be the same whether they are determined with grade 3 students or
grade 4 students. In order to approximate this equality, we take the
average difference between the independent difficulty estimates as a
linking constant (or translation constant) that can be used to bring the
difficulty estimates together. Because of the assumptions and properties
of our measurement model, the relationship between the (a) and (b) cells
should also hold for the (c) and (d) cells. In order to maintain the equality
of these relationships, we simply add the linking constant, 1.0, to each of
the estimated abilities of the grade 4 students. The addition of this
linking constant yields the revised estimates of mean abilities
($M_\beta'$), which represent the location of the mean ability on one
unidimensional scale that spans grades 3 and 4. The extension of this logic
and procedure to several tests over a longer time period is straightforward.
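
The arithmetic of the hypothetical example can be sketched in a few lines of
Python. This code is not part of the original paper. The ten item
difficulties are invented so that their means match cells (a) and (b), and
the function name `linking_constant` is an illustrative assumption.

```python
def linking_constant(reference_difficulties, other_difficulties):
    """Mean difference between two independent calibrations of the same common items."""
    if len(reference_difficulties) != len(other_difficulties):
        raise ValueError("both calibrations must cover the same common items")
    differences = [r - o for r, o in zip(reference_difficulties, other_difficulties)]
    return sum(differences) / len(differences)


# Invented difficulties (logits) for 10 common items, calibrated separately.
# Grade 3 calibration has mean .5 (cell a); grade 4 calibration has mean -.5 (cell b).
grade3_difficulties = [-0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
grade4_difficulties = [d - 1.0 for d in grade3_difficulties]

link = linking_constant(grade3_difficulties, grade4_difficulties)

# Mean abilities are centered at zero within each grade (cells c and d).
grade3_mean_ability = 0.0
grade4_mean_ability = 0.0

# Adding the link places grade 4 on the grade 3 scale (cell d').
grade4_revised_mean = grade4_mean_ability + link

print(round(link, 2), round(grade4_revised_mean, 2))
```

The same constant would be added to every individual grade 4 ability
estimate, not only to the mean, since a translation of the scale shifts all
locations equally.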

METHOD

Item response data from a national sample of greater than 70,000 students
were obtained from Scott, Foresman and Company. These data were used for
the standardization and calibration of the Comprehensive Assessment Program
(C.A.P.). The Comprehensive Assessment Program is a coordinated series of
tests and measures for evaluating students' educational growth. In order to
accomplish the goal of evaluating educational growth in achievement, equal
interval scales (EIS scales) were developed using the Rasch model for the
four substantive areas of reading, mathematics, language and study skills.

In order to illustrate the application of the Rasch model to the problem
of vertical equating, Forms 3A and 4B from the elementary Achievement Series
were used. There were 14 common items, and the independent estimates of the
difficulties (along with their standard errors in parentheses) are given in
columns one and two in Table 1. The next step is to compute the
differences in these difficulty estimates, which are shown in column three.
The mean of these differences is 1.22 (standard deviation of .37), which
provides the preliminary estimate of the linking constant.

The next task was to assess the fit of the items to the link. A two-
step procedure was employed to accomplish this. First, the difficulties
were plotted and approximate 95% confidence intervals developed. According
to the Rasch model, the plot should define a 45° line (slope of 1), so
that a constant (or mean difference) is the only adjustment required. Figure 1
shows the bivariate plot of the difficulty estimates for the 14 common items.
The items represented by the black circles are vocabulary items, and there is
some question about their contribution to the quality of the link. The
second step was to examine the residuals. This residual analysis is
summarized in columns four through six in Table 1. The standardized
residuals verify the conclusions drawn from the plot of the difficulties
that the vocabulary items do not fit as well as the reading comprehension
items. These standardized residuals are partially inflated due to the
very small standard errors of the item difficulties. These standard errors
are small because of the large sample size, which provides extremely precise
difficulty estimates but tends to make the statistical tests of fit
overly sensitive to outliers. A decision rule using the root mean square,
which is more robust and less sensitive to outliers, could be developed.
In practical situations, the decision rule to reject linking items becomes
a substantive issue rather than a statistical one. In the present example,
the four largest standardized residuals were associated with vocabulary
items. The decision was made to delete these items from the link, and the
computation of the revised linking constant of 1.018 (rounded to 1.02)
is given in Table 2.
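
The revised linking constant and the standardized residuals can be
recomputed from the ten retained items. The Python sketch below is not part
of the original paper. The difficulty differences and residual standard
errors are the values as read from Table 2, and the variable names are
illustrative. The root mean square at the end is only one possible form of
the decision rule mentioned above, shown here as an assumption.

```python
import math

# Difficulty differences (Form 3A minus Form 4B) for the 10 retained items,
# and the standard errors of their residuals, as read from Table 2.
differences = [0.971, 1.155, 0.876, 1.325, 0.926, 0.922, 0.892, 0.904, 1.228, 0.982]
residual_se = [0.077, 0.079, 0.077, 0.075, 0.075, 0.085, 0.085, 0.077, 0.085, 0.082]

# The linking constant is the mean of the difficulty differences.
link = sum(differences) / len(differences)

# Residual: how far each item's difference departs from the link.
residuals = [d - link for d in differences]

# Standardized residual: residual divided by its standard error.
standardized = [r / se for r, se in zip(residuals, residual_se)]

# Root mean square of the standardized residuals (illustrative summary only).
root_mean_square = math.sqrt(sum(z * z for z in standardized) / len(standardized))

print(round(link, 3))
print([round(z, 2) for z in standardized])
```

Small discrepancies in the second decimal place relative to the printed table
can arise because the table appears to divide rounded residuals by rounded
standard errors.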

The final task in developing an equal interval scale based on the
Rasch model is to take the linking constants and add them to the ability
estimates obtained on each form, which serves to translate the raw scores
on each form into the same metric on the latent variable of reading ability.
Table 3 gives the adjusted ability estimates in logits for the
corresponding raw scores on each form. Starting with form 2B, the mean ability
estimates are centered at zero (mean = -.006). In order to link scores
on form 3A and make them equivalent to ability estimates derived from
form 2B, the linking constant of 1.03 is added to the initial ability
estimates, which are then centered at 1.03 (mean = 1.03). In the last column
of Table 3, form 4B is linked to the other two tests by adding the linking
constant of 2.05, which is the sum of the link between forms 2B and 3A (1.03)
and the link between forms 3A and 4B (1.02). It should be pointed out that
the equal interval scale is centered on form 2B, so that the linking
constants accumulate as we move across the forms. Once the forms are equated
and a table like Table 3 is constructed, it is very easy to obtain equivalent
ability estimates independent of the forms used to provide the estimates.
In other words, if a student's ability in logits was approximately 1.00,
we would expect the raw scores of 70 on form 2B, 50 on form 3A, and 30
on form 4B.
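
The way the linking constants accumulate across forms can be sketched as
follows. This Python code is not part of the original paper. The two adjacent
links (1.03 and 1.02) come from the text; the function name
`to_common_scale` and the unadjusted ability of -1.00 on form 4B are
assumptions made for illustration.

```python
from itertools import accumulate

# Forms in order of difficulty; the scale is centered on form 2B.
forms = ["2B", "3A", "4B"]

# Link from each form to the form before it (0.0 for the reference form).
adjacent_links = [0.0, 1.03, 1.02]

# Running sum of the adjacent links gives each form's total shift.
cumulative_links = dict(zip(forms, accumulate(adjacent_links)))


def to_common_scale(form: str, ability: float) -> float:
    """Translate an ability estimate from one form's own calibration to the common scale."""
    return ability + cumulative_links[form]


# Hypothetical unadjusted estimate of -1.00 logits on form 4B.
adjusted = to_common_scale("4B", -1.00)

print({form: round(shift, 2) for form, shift in cumulative_links.items()})
print(round(adjusted, 2))
```

Under these assumptions an unadjusted estimate of -1.00 on form 4B becomes
1.05 on the common scale, which is the kind of entry a table such as Table 3
reports for each raw score.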

DISCUSSION

The Rasch model provides a clear and practical method for equating
educational and psychological tests. It is the only equating method based
on latent trait measurement that can meet the second condition necessary
in order to equate tests; namely, the sample-free condition. The other
latent trait models, by including parameters for item discrimination and
guessing, provide sample-dependent item and person statistics. The specific
objectivity, which is provided by Rasch measurement models, yields the
possibility of objective equating. Objective measurement and equating are
necessary in order to measure student growth in achievement and in order to
measure educational development.

The first section of this paper provided an introduction to the Rasch
model. In the second section a detailed illustration of the application
of the Rasch model to the problem of vertical equating was developed.



Table 1. Analysis of item links for equating Form 3A and Form 4B (Reading).

                                                    Residual                    Standardized
                                                    Difference    S.E.          Residual
Item   Form 3A         Form 4B         Difference   (D - 1.22)    Residual      z = (D - 1.22)/S.E.

1-8    [illegible in source]
9      -.032 (.041)    -1.260 (.074)   1.228        .008          .085          .09
10     -.032 (.041)    -1.014 (.071)   .982         -.238         [illegible]   -2.90
11     1.115 (.040)    -.798 (.068)    1.913        [illegible]   [illegible]   [illegible]
12     .26[?] (.040)   -1.396 (.076)   [illegible]  [illegible]   .086          5.16
13     .481 (.040)     -1.310 (.075)   1.791        .571          .085          [illegible]
14     1.2[?] (.040)   -.293 (.065)    1.560        [illegible]   .076          [illegible]

Mean                                   [illegible]  .000                        [illegible]
S.D.                                   [illegible]  .3[?]                       [illegible]



Table 2.

        Difficulty     Residual      S.E.        Standardized
Name    Difference     Difference    Residual    Residual

1       .971           -.047         .077        [illegible]
2       1.155          .137          .079        1.73
3       .876           -.142         .077        -1.84
4       1.325          .308          .075        4.12
5       .926           -.092         .075        -1.23
6       .922           -.096         .085        -1.13
7       .892           -.126         .085        -1.48
8       .904           -.114         .077        -1.48
9       1.228          .210          .085        2.47
10      .982           -.036         .082        [illegible]

Mean    [illegible]    .000                      .01
S.D.    [illegible]    .16                       2.03

Table 3. Adjusted ability estimates in logits (standard errors in parentheses)
         for raw scores on reading tests, Forms 2B through 4B.

Raw
Score   Form 2B         Form 3A         Form 4B

1       -5.20 (1.03)    -3.83 (1.01)    -2.98 (1.01)
5       [illegible]     -2.13 (.47)     -1.29 (.4[?])
10      -2.61 (.36)     -1.39 (.34)     -.49 ([illegible])
15      -2.06 (.31)     -.90 (.29)      .03 (.30)
20      -1.63 (.28)     -.53 (.26)      .42 (.27)
25      -1.28 (.26)     -.21 (.24)      .75 (.25)
30      -.97 (.24)      .07 (.23)       1.05 (.24)
35      -.69 (.23)      .32 (.22)       1.32 (.23)
40      -.43 (.23)      .56 (.22)       1.57 (.22)
45      -.18 (.22)      .79 (.22)       1.82 (.22)
50      .06 (.22)       1.02 (.21)      2.06 (.22)
55      .29 (.22)       1.25 (.22)      2.30 (.22)
60      .53 (.22)       1.48 (.22)      2.54 (.22)
65      .78 (.23)       1.73 (.22)      2.8[?] (.23)
70      1.04 (.23)      1.99 (.23)      3.06 (.24)
75      1.32 (.24)      2.27 (.25)      3.35 (.25)
80      1.64 (.26)      2.59 (.26)      3.68 (.27)
85      2.01 (.29)      2.97 (.29)      4.0[?] (.30)
90      2.50 (.34)      3.47 (.35)      4.57 (.35)
95      3.28 (.47)      4.25 ([illegible])  5.35 (.47)
99      4.94 (1.0[?])   5.92 (1.01)     7.03 (1.01)

Mean    -.006           1.03            2.05
S.D.    2.32            2.[?]2          2.30






